We propose the first joint audio-video generation framework that delivers engaging watching and listening experiences simultaneously, aiming at high-quality realistic videos. To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders. In contrast to existing single-modal diffusion models, MM-Diffusion consists of a sequential multi-modal U-Net designed for a joint denoising process. Two subnets for audio and video learn to gradually generate aligned audio-video pairs from Gaussian noise. To ensure semantic consistency across modalities, we propose a novel random-shift based attention block that bridges the two subnets, enabling efficient cross-modal alignment so that audio and video fidelity reinforce each other. Extensive experiments show superior results in unconditional audio-video generation and in zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve the best FVD and FAD on the Landscape and AIST++ dancing datasets. Turing tests with 10k votes further demonstrate dominant preferences for our model. The code and pre-trained models can be downloaded at https://github.com/researchmm/MM-Diffusion.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
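In theory, the success of unsupervised domain adaptation (UDA) largely depends on domain gap estimation. For source-free UDA, however, the source domain data cannot be accessed during adaptation, which poses a great challenge for measuring the domain gap. In this paper, we propose to use many classifiers to learn the source domain decision boundaries, which provides a tighter upper bound on the domain gap even when the data of the two domains cannot be accessed simultaneously. The source model is trained to push apart each pair of classifiers while ensuring the correctness of the decision boundaries. In this sense, our many-classifier model separates the different source categories as far as possible, which induces maximum disagreement among the classifiers in the target domain and thus maximizes the transferable source domain knowledge. For adaptation, the source model is adapted to maximize the agreement among pairs of classifiers. As a result, the target features are pushed away from the decision boundaries. Experiments on several UDA datasets show that our approach achieves state-of-the-art performance among source-free UDA methods and is even competitive with source-available UDA methods.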
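As a rough illustration of the classifier disagreement that this abstract refers to (pushed apart between classifier pairs on the source side, reduced during adaptation), the sketch below computes a mean pairwise discrepancy between classifier outputs. The L1 distance between softmax outputs, the function names, and the array layout are assumptions made for illustration; the abstract does not specify which measure the authors use.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mean_pairwise_disagreement(logits):
    """Mean L1 distance between the softmax outputs of every classifier pair.

    logits has shape (num_classifiers, batch, num_classes) and must contain
    at least two classifiers. The L1 measure is an assumed stand-in.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    num_classifiers = probs.shape[0]
    total, pairs = 0.0, 0
    for i in range(num_classifiers):
        for j in range(i + 1, num_classifiers):
            # Per-sample L1 distance, averaged over the batch.
            total += np.abs(probs[i] - probs[j]).sum(axis=-1).mean()
            pairs += 1
    return total / pairs


# Hypothetical usage: 4 classifiers, 8 samples, 3 classes.
rng = np.random.default_rng(0)
example_logits = rng.normal(size=(4, 8, 3))
print(mean_pairwise_disagreement(example_logits))
```

Under these assumptions the value is 0 when all classifiers agree and approaches 2 when every pair puts its probability mass on different classes.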
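Objective: To create a dataset for the development and evaluation of clinical question answering (QA) systems that can handle multi-answer questions. We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate a QA dataset. The 1-to-0 and 1-to-N drug-reason relations formed the unanswerable and multi-answer entries, which represent challenging scenarios that existing clinical QA datasets lack. Results: The resulting RxWhyQA dataset contains 91,440 QA entries, of which half are unanswerable, and 21% (n = 19,269) of the answerable ones require multiple answers. The dataset conforms to the community-vetted Stanford Question Answering Dataset (SQuAD) format. Discussion: RxWhyQA is useful for comparing different systems that need to handle the zero-answer and multi-answer challenges, which demand dual mitigation of both false-positive and false-negative answers. Conclusion: We created and shared a clinical QA dataset with a focus on multi-answer questions to represent real-world scenarios.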
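To make the unanswerable and multi-answer cases concrete, the sketch below builds two entries in the SQuAD 2.0-style JSON layout (`context`, `qas`, `is_impossible`, `answers` with `text` and `answer_start`). The clinical sentence, question wording, identifiers, and the helper `make_qa` are all invented for illustration and are not taken from RxWhyQA.

```python
import json


def make_qa(qid, question, context, answer_texts):
    """Build one SQuAD 2.0-style QA entry (hypothetical helper).

    An empty answer list marks the question as unanswerable.
    """
    answers = []
    for text in answer_texts:
        start = context.find(text)
        if start < 0:
            raise ValueError(f"answer {text!r} not found in context")
        # answer_start is the character offset of the answer in the context.
        answers.append({"text": text, "answer_start": start})
    return {
        "id": qid,
        "question": question,
        "is_impossible": not answers,
        "answers": answers,
    }


# Synthetic note text, written only for this example.
context = (
    "Patient was started on warfarin for atrial fibrillation "
    "and a prior DVT. Continue lisinopril."
)
qas = [
    # Multi-answer entry: one drug, two documented reasons (1-to-N).
    make_qa("q1", "Why was the patient prescribed warfarin?", context,
            ["atrial fibrillation", "a prior DVT"]),
    # Unanswerable entry: no reason is documented for this drug (1-to-0).
    make_qa("q2", "Why was the patient prescribed lisinopril?", context, []),
]
dataset = {
    "version": "v2.0",
    "data": [{"title": "synthetic-note-1",
              "paragraphs": [{"context": context, "qas": qas}]}],
}
print(json.dumps(dataset, indent=2))
```

A system evaluated on such entries has to return every listed span for `q1` and abstain on `q2`, which is the dual false-negative and false-positive challenge the abstract describes.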
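In recent years, graph neural networks (GNNs) have shown superior performance in diverse real-world applications. To improve model capacity, GNN topology design is very important in addition to the design of aggregation operations. In general, there are two mainstream approaches to GNN topology design. The first is to stack aggregation operations to obtain higher-level features, which easily suffers a performance drop as the network goes deeper. The second is to use multiple aggregation operations in each layer, which provides adequate and independent feature extraction on local neighbors but makes higher-level information costly to obtain. To enjoy the benefits of both approaches while alleviating their respective deficiencies, we learn to design the topology of GNNs from a novel feature fusion perspective, dubbed F$^2$GNN. Specifically, we provide a feature fusion perspective on designing GNN topology and propose a novel framework that unifies existing topology designs through feature selection and fusion strategies. We then develop a neural architecture search method on top of the unified framework, which contains a set of selection and fusion operations in the search space and an improved differentiable search algorithm. The performance gains on eight real-world datasets demonstrate the effectiveness of F$^2$GNN. We further conduct experiments to show that F$^2$GNN can improve model capacity while alleviating the deficiencies of existing GNN topology designs, especially the over-smoothing problem, by adaptively using different levels of features.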
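In recent years, graph neural networks (GNNs) have shown superior performance for diverse applications on real-world datasets. To improve model capacity and alleviate the over-smoothing problem, several methods have been proposed to incorporate intermediate layers through layer-wise connections. However, because graph types are highly diverse, the performance of existing methods varies across graphs, which leads to a need for data-specific layer-wise connection methods. To address this problem, we propose LLC (Learn Layer-wise Connections), a novel framework based on neural architecture search (NAS) that learns adaptive connections among intermediate layers in GNNs. LLC contains a novel search space consisting of three types of blocks and learnable connections, together with a differentiable search algorithm that makes the search process efficient. Extensive experiments on five real-world datasets show that the searched layer-wise connections not only improve performance but also alleviate the over-smoothing problem.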
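The enormous wave of technological innovation over the past few years, marked by advances in AI technologies, is profoundly reshaping industry and society. However, a key challenge awaits us down the road: our ability to meet rapidly growing scenario-specific demands is severely limited by the cost of acquiring a commensurate amount of training data. This difficult situation is in essence due to the limitations of the mainstream learning paradigm: we need to train a new model for each new scenario based on a large quantity of well-annotated data, and commonly from scratch. To tackle this fundamental problem, we move beyond it and develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the trained model develops strong generalizability. We evaluate our model on 26 well-known datasets that cover four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, consistently outperform counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect in which such a model with general vision capability can dramatically reduce the reliance on data and thus expedite the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which together form a general vision ecosystem that supports its future development in an open and inclusive manner.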
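Despite recent advances in clinical natural language processing (NLP), there is noticeable resistance in the clinical and translational research community to adopting NLP models because of their limited transparency, interpretability, and usability. In this study, we propose an open natural language processing development framework. We evaluated it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interest in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow that generates texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold-standard annotations were tested with a single-institution (Mayo) ruleset. This resulted in F-scores of 0.876, 0.706, and 0.694 for the Mayo, Minnesota, and Kentucky test datasets, respectively. The study, a consortium effort of the N3C NLP subgroup, demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP research and adoption. Although we use COVID-19 as the use case in this work, our framework is general enough to be applied to other domains of interest in clinical NLP.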
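Many modern machine learning algorithms, such as generative adversarial networks (GANs) and adversarial training, can be formulated as minimax optimization. Gradient descent ascent (GDA) is the most commonly used algorithm because of its simplicity. However, GDA can converge to non-optimal minimax points. We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration and solves it using Anderson Mixing to converge to the local minimax. It addresses the divergence issue of simultaneous GDA and accelerates the convergence of alternating GDA. We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions. We also show empirically that GDA-AM solves a variety of minimax problems and improves GAN training on several datasets.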
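The idea of treating GDA as a fixed-point iteration and accelerating it with Anderson mixing can be sketched on the toy bilinear problem f(x, y) = x * y, whose minimax point is the origin. This is a simplified illustration and not the authors' implementation: the alternating update, the step size, the window size, and the function names `alt_gda_step` and `anderson_mixing` are assumptions chosen for the example.

```python
import numpy as np

ETA = 0.5  # step size, an arbitrary choice for this toy problem


def alt_gda_step(w, eta=ETA):
    """One alternating GDA step on f(x, y) = x * y (min over x, max over y)."""
    x, y = w
    x_new = x - eta * y        # descent on x uses the old y
    y_new = y + eta * x_new    # ascent on y uses the updated x
    return np.array([x_new, y_new])


def anderson_mixing(g, w0, window=3, iters=50, tol=1e-12):
    """Anderson mixing for the fixed-point problem w = g(w)."""
    ws = [np.asarray(w0, dtype=float)]
    gs = [g(ws[0])]
    w = gs[0]  # the first step is a plain fixed-point step
    for _ in range(iters):
        ws.append(w)
        gs.append(g(w))
        # Keep only the most recent window + 1 iterates.
        ws, gs = ws[-(window + 1):], gs[-(window + 1):]
        res = [gi - wi for gi, wi in zip(gs, ws)]  # residuals g(w) - w
        if np.linalg.norm(res[-1]) < tol:
            break
        # Differences of consecutive residuals and map values, as columns.
        d_res = np.stack([res[i + 1] - res[i] for i in range(len(res) - 1)], axis=1)
        d_g = np.stack([gs[i + 1] - gs[i] for i in range(len(gs) - 1)], axis=1)
        # Least-squares mixing coefficients that minimize the combined residual.
        gamma, *_ = np.linalg.lstsq(d_res, res[-1], rcond=None)
        w = gs[-1] - d_g @ gamma
    return w


start = np.array([1.0, 1.0])

plain = start.copy()
for _ in range(200):
    plain = alt_gda_step(plain)

accelerated = anderson_mixing(alt_gda_step, start)
print("plain alternating GDA distance to optimum:", np.linalg.norm(plain))
print("with Anderson mixing:", np.linalg.norm(accelerated))
```

On this problem plain alternating GDA keeps cycling around the origin without approaching it, while the mixed iteration reaches it within a few steps, because for a linear map in two dimensions the least-squares step becomes exact once two independent residual differences are available.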
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g., statistical power) but substantially better in earning (e.g., direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.